Robust Language Pair-Independent Sub-Tree Alignment
Authors
Abstract
Data-driven approaches to machine translation (MT) achieve state-of-the-art results. Many syntax-aware approaches, such as Example-Based MT and Data-Oriented Translation (DOT), make use of tree pairs aligned at the sub-sentential level. Obtaining sub-sentential alignments manually is time-consuming and error-prone, and requires expert knowledge of both the source and target languages. We propose a novel, language pair-independent algorithm that automatically induces alignments between phrase-structure trees. We evaluate the alignments themselves against a manually aligned gold standard, and perform an extrinsic evaluation by using the aligned data to train and test a DOT system. Our results show that translation accuracy is comparable to that of the same translation system trained on manually aligned data, and that coverage improves.
Similar resources
Exploring Syntactic Structural Features for Sub-Tree Alignment Using Bilingual Tree Kernels
We propose Bilingual Tree Kernels (BTKs) to capture the structural similarities across a pair of syntactic translational equivalences, and apply BTKs to sub-tree alignment along with some plain features. Our study reveals that the structural features embedded in a bilingual parse tree pair are very effective for sub-tree alignment and that bilingual tree kernels capture such features well. Th...
Robust Sub-Sentential Alignment of Phrase-Structure Trees
Data-Oriented Translation (DOT), based on Data-Oriented Parsing (DOP), is a language-independent MT engine which exploits parsed, aligned bitexts to produce very high quality translations. However, data acquisition constitutes a serious bottleneck, as DOT requires parsed sentences aligned at both sentential and sub-structural levels. Manual sub-structural alignment is time-consuming, error-prone a...
Language-Independent Bilingual Terminology Extraction from a Multilingual Parallel Corpus
We present a language-pair independent terminology extraction module that is based on a sub-sentential alignment system that links linguistically motivated phrases in parallel texts. Statistical filters are applied on the bilingual list of candidate terms that is extracted from the alignment output. We compare the performance of both the alignment and terminology extraction module for three dif...
Deep Syntactic Structures for String-to-Tree Translation
1.1 String-to-tree translation. A state-of-the-art syntax-based Statistical Machine Translation (SMT) model, the string-to-tree translation model (Galley et al., 2004; Galley et al., 2006; Chiang et al., 2009), constructs a number of parse trees of the target language by ‘parsing’ a source language sentence using a bilingual translation grammar. Given a set of parallel sentences for tra...
Discriminative Induction of Sub-Tree Alignment using Limited Labeled Data
We employ a Maximum Entropy model to conduct sub-tree alignment between bilingual phrasal structure trees. Various kinds of lexical and structural knowledge are explored to measure the syntactic similarity across Chinese-English bilingual tree pairs. In the experiment, we evaluate the sub-tree alignment using both a gold-standard treebank and an automatically parsed corpus with manually annotated sub-tree ...